    Learning from Scarce Experience

    Searching the space of policies directly for the optimal policy has been one popular method for solving partially observable reinforcement learning problems. Typically, with each change of the target policy, its value is estimated from the results of following that very policy. This requires a large number of interactions with the environment as different policies are considered. We present a family of algorithms based on likelihood ratio estimation that use data gathered when executing one policy (or a collection of policies) to estimate the value of a different policy. The algorithms combine estimation and optimization stages. The former uses experience to build a non-parametric representation of the function being optimized; the latter performs optimization on this estimate. We show positive empirical results and provide a sample complexity bound. Comment: 8 pages, 4 figures
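    The core idea of reusing one policy's experience to score another can be sketched with plain (unnormalized) importance sampling. This is a minimal illustration, not the paper's algorithm: it assumes memoryless policies over discrete observations and actions, a single behavior policy, and hypothetical helpers `likelihood_ratio` and `off_policy_value`. In a POMDP the unknown transition and observation probabilities appear in both trajectory likelihoods and cancel, so only the policies' action probabilities are needed.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical representation: a policy returns P(action | observation).
Policy = Callable[[int, int], float]
# Hypothetical representation: an episode is its (observation, action) steps plus its total return.
Episode = Tuple[Sequence[Tuple[int, int]], float]


def likelihood_ratio(steps: Sequence[Tuple[int, int]], target: Policy, behavior: Policy) -> float:
    """Product over steps of target(a|o) / behavior(a|o).

    Environment dynamics appear in both trajectory probabilities and cancel,
    so only the action probabilities of the two policies are used.
    """
    ratio = 1.0
    for obs, act in steps:
        ratio *= target(obs, act) / behavior(obs, act)
    return ratio


def off_policy_value(episodes: Sequence[Episode], target: Policy, behavior: Policy) -> float:
    """Plain importance-sampling estimate of the target policy's expected return."""
    weighted = [likelihood_ratio(steps, target, behavior) * ret for steps, ret in episodes]
    return sum(weighted) / len(weighted)


# Toy usage: one observation, two actions; the return is 1 only when action 1 is taken.
def behavior(obs: int, act: int) -> float:
    return 0.5  # uniform over actions 0 and 1


def target(obs: int, act: int) -> float:
    return 0.9 if act == 1 else 0.1


episodes = [([(0, 1)], 1.0), ([(0, 0)], 0.0)]
estimate = off_policy_value(episodes, target, behavior)
print(estimate)  # 0.9, the target policy's true expected return in this toy problem
```

    No episode was generated by the target policy, yet its value is recovered by reweighting the behavior policy's episodes. With longer episodes the product of ratios can vary enormously, which is the variance problem that normalized estimators address.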

    Policy Improvement for POMDPs Using Normalized Importance Sampling

    We present a new method for estimating the expected return of a POMDP from experience. The estimator does not assume any knowledge of the POMDP and allows the experience to be gathered with an arbitrary set of policies. The return can then be estimated for any new policy of the POMDP. We motivate the estimator from the function-approximation and importance-sampling points of view and derive its theoretical properties. Although the estimator is biased, it has low variance, and the bias is often irrelevant when the estimator is used for pairwise comparisons. We conclude by extending the estimator to policies with memory and compare its performance in a greedy search algorithm to the REINFORCE algorithm, showing an order-of-magnitude reduction in the number of trials required.
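    A normalized (self-normalized, or weighted) importance-sampling estimator divides by the sum of the importance weights rather than by the number of episodes. The sketch below is a hedged illustration of that general technique, not the paper's exact estimator: it assumes memoryless policies, pools episodes from several sampling policies, and treats them as draws from an equal-weight mixture of those policies (i.e., it assumes each sampling policy contributed the same number of episodes). The helpers `action_prob` and `normalized_is_value` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple

Policy = Callable[[int, int], float]  # hypothetical: P(action | observation)
Episode = Tuple[Sequence[Tuple[int, int]], float]  # hypothetical: (steps, total return)


def action_prob(steps: Sequence[Tuple[int, int]], policy: Policy) -> float:
    """Policy's probability of the episode's action sequence.

    Environment terms are omitted because they cancel in every ratio below.
    """
    p = 1.0
    for obs, act in steps:
        p *= policy(obs, act)
    return p


def normalized_is_value(episodes: Sequence[Episode], target: Policy,
                        sampling_policies: Sequence[Policy]) -> float:
    """Self-normalized importance-sampling estimate of the target's expected return.

    Assumes the pooled episodes came from an equal-weight mixture of sampling_policies.
    """
    weights: List[float] = []
    returns: List[float] = []
    for steps, ret in episodes:
        mixture = sum(action_prob(steps, b) for b in sampling_policies) / len(sampling_policies)
        weights.append(action_prob(steps, target) / mixture)
        returns.append(ret)
    # Dividing by the total weight (not the episode count) keeps the estimate within
    # the range of observed returns: biased, but with much lower variance.
    return sum(w * r for w, r in zip(weights, returns)) / sum(weights)


def uniform(obs: int, act: int) -> float:
    return 0.5


def target(obs: int, act: int) -> float:
    return 0.9 if act == 1 else 0.1


# Three episodes that all happened to take action 1 and earn return 1.
lucky = [([(0, 1)], 1.0)] * 3
print(normalized_is_value(lucky, target, [uniform]))  # 1.0
```

    On the same three episodes, the plain importance-sampling average would be 1.8, a value no policy can achieve when returns lie in [0, 1]. The normalized estimate stays at 1.0, illustrating the trade of a small bias for lower variance that the abstract describes.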

    An Electronic Market-Maker

    This paper presents an adaptive learning model for market-making under the reinforcement learning framework. Reinforcement learning is a learning technique in which agents aim to maximize long-term accumulated reward. No knowledge of the market environment, such as the order arrival or price process, is assumed. Instead, the agent learns from real-time market experience and develops explicit market-making strategies, achieving multiple objectives, including maximizing profits and minimizing the bid-ask spread. The simulation results show initial success in bringing learning techniques to the building of market-making algorithms.

    Search for gamma-ray emission from p-wave dark matter annihilation in the Galactic Center

    Indirect searches for dark matter (DM) through Standard Model products of its annihilation generally assume a cross-section that is dominated by a velocity-independent term (s-wave annihilation). However, in many DM models an s-wave annihilation cross-section is absent or helicity suppressed. To reproduce the correct DM relic density in these models, the leading term in the cross-section is proportional to the DM velocity squared (p-wave annihilation). Indirect detection of such p-wave DM is difficult because the average velocities of DM in galaxies today are orders of magnitude lower than the DM velocity at the time of decoupling from the primordial thermal plasma, which suppresses the annihilation cross-section today by some five orders of magnitude relative to its value at freeze-out. Thus p-wave DM is out of reach of traditional searches for DM annihilation in the Galactic halo. Near the region of influence of a central supermassive black hole, such as Sgr A*, however, DM can form a localized over-density known as a 'spike'. In such spikes the DM is predicted to be both concentrated in space and accelerated to higher velocities, allowing the γ-ray signature of its annihilation to potentially be detectable above the background. We use the Fermi Large Area Telescope to search for the γ-ray signature of p-wave annihilating DM from a spike around Sgr A* in the energy range 10 GeV to 600 GeV. Such a signal would appear as a point source with sharp line-like or box-like spectral features that are difficult to mimic with standard astrophysical processes, indicating a DM origin. We find no significant excess of γ rays in this range, and we place upper limits on the flux in γ-ray boxes originating from the Galactic Center. This result, the first of its kind, is interpreted in the context of different models of the DM density near Sgr A*. Comment: 16 pages, 7 figures
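    The "five orders of magnitude" suppression follows from the v² scaling of a p-wave cross-section. The arithmetic below is a back-of-the-envelope sketch; the two speeds are illustrative assumptions chosen as typical order-of-magnitude values, not numbers taken from the paper, and `p_wave_suppression` is a hypothetical helper.

```python
import math

# Illustrative assumptions (not values from the paper), as fractions of the speed of light:
V_FREEZE_OUT = 0.3  # typical relative DM speed around thermal decoupling
V_HALO = 1e-3       # typical DM speed in a Milky Way-like halo today (~300 km/s)


def p_wave_suppression(v_now: float, v_then: float) -> float:
    """Ratio of sigma*v now to sigma*v then when sigma*v is proportional to v**2."""
    return (v_now / v_then) ** 2


ratio = p_wave_suppression(V_HALO, V_FREEZE_OUT)
orders = -math.log10(ratio)
print(f"suppression factor {ratio:.1e} (about {orders:.0f} orders of magnitude)")
# prints: suppression factor 1.1e-05 (about 5 orders of magnitude)
```

    The same scaling explains the interest in spikes: any increase in the local DM speed enters squared, so doubling the typical speed near Sgr A* would raise the annihilation rate per pair by a factor of four, on top of the gain from the higher density.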